Exploiting PageRank at Different Block Level
نویسندگان
چکیده
In recent years, information retrieval methods focusing on the link analysis have been developed; The PageRank and HITS are two typical ones According to the hierarchical organization of Web pages, we could partition the Web graph into blocks at different level, such as page level, directory level, host level and domain level. On the basis of block, we could analyze the different hyperlinks among pages. Several approaches proposed that the intrahyperlink in a host maybe less useful in computing the PageRank. However, there are no reports on how concretely the intraor inter-hyperlink affects the PageRank. Furthermore, based on different block level, inter-hyperlink and intra-hyperlink can be two relative concepts. Thus which level should be optimal to distinguish the intraor inter-hyperlink? And how the ratio set between the intra-hyperlink and inter-hyperlink could ultimately improve performance of the PageRank algorithm? In this paper, we analyze the link distribution at the different block level and evaluate the importance of the intraand interhyperlink to PageRank on the TREC Web Track data set. Experiment shows that, if we set the block at host level and the ratio of the weight between the intra-hyperlink and inter-hyperlink is 1:4, the retrieval could achieve the best performance.
منابع مشابه
Exploiting the Block Structure of the Web for Computing PageRank
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link...
متن کاملBlock Models and Personalized PageRank
Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the "seed set expansion problem": given a subset [Formula: see text] of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the ob...
متن کاملPageRank in Undirected Random Graphs
PageRank has numerous applications in information retrieval, reputation systems, machine learning, and graph partitioning. In this paper, we study PageRank in undirected random graphs with an expansion property. The Chung-Lu random graph is an example of such a graph. We show that in the limit, as the size of the graph goes to infinity, PageRank can be approximated by a mixture of the restart d...
متن کاملOptimizing Personalized Retrieval System Based on Web Ranking
This paper drew up a personalized recommender system model combined the text categorization with the pagerank. The document or the page was considered in two sides: the content of the document and the domain it belonged to. The features were extracted in order to form the feature vector, which would be used in computing the difference between the documents or keywords with the user’s interests ...
متن کاملBlack hole metric: Overcoming the pagerank normalization problem
In network science, there is often the need to sort the graph nodes. While the sorting strategy may be different, in general sorting is performed by exploiting the network structure. In particular, the metric PageRank has been used in the past decade in different ways to produce a ranking based on how many neighbors point to a specific node. PageRank is simple, easy to compute and effective in ...
متن کامل